Smart Paper Notes

Time Is MattEr: Temporal Self-supervision for Video Transformers

Sukmin Yun, Jaehyung Kim, Dongyoon Han, Hwanjun Song, Jung-Woo Ha, Jinwoo Shin

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-07-19

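Understanding the temporal dynamics of video is an essential part of learning better video representations. Recently, transformer-based architectures have been explored extensively because of their ability to capture long-term dependencies in input sequences. However, we find that these video transformers are still biased toward learning spatial dynamics rather than temporal ones, and that removing this spurious correlation is critical to their performance. Based on these observations, we design simple yet effective self-supervised tasks that help video models learn temporal dynamics better. Specifically, to counter the spatial bias, our method learns the temporal order of video frames as extra self-supervision and forces randomly shuffled frames to produce low-confidence outputs. In addition, our method learns the temporal flow direction of video tokens across consecutive frames to strengthen the correlation with temporal dynamics. On a variety of video action recognition tasks, we demonstrate the effectiveness of our method and its compatibility with state-of-the-art video transformers.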
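
The low-confidence objective for shuffled frames can be illustrated with a small sketch. The abstract does not give the exact loss, so the version below assumes a uniform target distribution over action classes; the function names (`shuffle_frames`, `low_confidence_loss`) are hypothetical and not taken from the paper.

```python
import math
import random


def log_softmax(logits):
    """Numerically stable log-softmax over a list of class logits."""
    peak = max(logits)
    log_total = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_total for x in logits]


def shuffle_frames(frames, rng):
    """Return the frames in a random order that differs from the original."""
    if len(frames) < 2:
        raise ValueError("need at least two frames to change their order")
    identity = list(range(len(frames)))
    order = identity[:]
    while order == identity:
        rng.shuffle(order)
    return [frames[i] for i in order]


def low_confidence_loss(logits):
    """Cross-entropy between a uniform target (an assumption) and softmax(logits).

    The minimum, log(num_classes), is reached when the prediction is uniform,
    so minimizing this on shuffled clips discourages confident outputs.
    """
    log_probs = log_softmax(logits)
    return -sum(log_probs) / len(log_probs)


rng = random.Random(0)
clip = ["f0", "f1", "f2", "f3"]          # stand-ins for video frames
shuffled_clip = shuffle_frames(clip, rng)
```

In a training loop, the shuffled clip would be fed to the video model and `low_confidence_loss` applied to its logits, alongside the usual classification loss on the clip in its original order.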

e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce

Wonyoung Shin, Jonghun Park, Taekang Woo, Yongwoo Cho, Kwangjin Oh, Hwanjun Song

Categories: Machine Learning | Computer Vision

2022-07-01

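Understanding vision and language representations of product content is vital for search and recommendation applications in e-commerce. As a backbone for online shopping platforms, and inspired by the recent success of representation learning research, we propose a contrastive learning framework that aligns language and visual models using unlabeled raw product text and images. We present the techniques we used to train large-scale representation learning models and share solutions to domain-specific challenges. We study the use of our pre-trained model as a backbone for diverse downstream tasks, including category classification, attribute extraction, product matching, product clustering, and adult product recognition. Experimental results show that our proposed method outperforms both single-modality and multi-modality baselines on every downstream task.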
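
A contrastive objective that aligns image and text embeddings can be sketched as follows. This is a generic symmetric InfoNCE loss in the style of CLIP, not necessarily the exact loss used by e-CLIP; the temperature value of 0.07 and all function names are assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_row(scores, target):
    """Softmax cross-entropy for one row of similarity scores."""
    peak = max(scores)
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_total - scores[target]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE: the i-th image should match the i-th text and vice versa."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # Cosine similarity between every image and every text, scaled by temperature.
    sim = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts] for i in imgs]
    image_to_text = sum(cross_entropy_row(sim[k], k) for k in range(n)) / n
    text_to_image = sum(
        cross_entropy_row([sim[r][k] for r in range(n)], k) for k in range(n)
    ) / n
    return 0.5 * (image_to_text + text_to_image)


aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Matched image-text pairs yield a loss near zero, while swapped pairs are penalized heavily, which is the signal that pulls the two encoders into a shared embedding space.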

ReFine: Re-randomization before Fine-tuning for Cross-domain Few-shot Learning

Jaehoon Oh, Sungnyun Kim, Namgyu Ho, Jin-Hwa Kim, Hwanjun Song, Se-Young Yun

Categories: Computer Vision

2022-05-11

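Cross-domain few-shot learning (CD-FSL), in which only a few target samples are available and the source and target domains differ greatly, has recently attracted considerable attention. For CD-FSL, recent studies have generally developed transfer-learning-based approaches that pre-train a neural network on popular labeled source-domain datasets and then transfer it to target-domain data. Although the labeled datasets may provide suitable initial parameters for the target data, the domain difference between source and target can hinder fine-tuning on the target domain. This paper proposes a simple yet powerful method that re-randomizes the parameters fitted on the source domain before adapting to the target data. Re-randomization resets the source-specific parameters of the source pre-trained model and thus facilitates fine-tuning on the target domain, improving few-shot performance.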
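
The idea of re-randomizing part of a pre-trained model before fine-tuning can be sketched in a few lines. The abstract does not say which parameters count as source-specific or how they are re-initialized, so the layer names, the choice of layers to reset, and the uniform initialization range below are all assumptions.

```python
import random


def re_randomize(params, layers_to_reset, rng, scale=0.1):
    """Return a copy of `params` with the chosen layers re-initialized at random.

    `params` maps a layer name to a flat list of weights. Layers not listed in
    `layers_to_reset` keep their pre-trained values.
    """
    new_params = {}
    for name, weights in params.items():
        if name in layers_to_reset:
            # Discard the source-fitted values and draw fresh ones.
            new_params[name] = [rng.uniform(-scale, scale) for _ in weights]
        else:
            new_params[name] = list(weights)
    return new_params


# Hypothetical pre-trained weights: an early block and a late block.
pretrained = {"block1": [0.5, -0.2], "block4": [1.3, 0.7]}
reset = re_randomize(pretrained, {"block4"}, random.Random(0))
```

Fine-tuning on the few target samples would then start from `reset` rather than from `pretrained`.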

FedRN: Exploiting k-Reliable Neighbors Towards Robust Federated Learning

SangMook Kim, Wonyoung Shin, Soohyuk Jang, Hwanjun Song, Se-Young Yun

Categories: Machine Learning

2022-05-03

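Robustness is becoming another important challenge in federated learning, because the data collection process at each client is naturally accompanied by noisy labels. The problem is far more complex and challenging than in the centralized setting, however, because the varying levels of data heterogeneity and noise across clients exacerbate the client-to-client performance discrepancy. In this work, we propose a robust federated learning method called FedRN, which exploits k-reliable neighbors with high data expertise or similarity. Our method helps narrow the gap between low- and high-performance clients by training only on a selected set of clean examples, identified by their ensembled mixture models. We demonstrate the superiority of FedRN through extensive evaluations on three real-world or synthetic benchmark datasets. Compared with existing robust training methods, the results show that FedRN significantly improves test accuracy in the presence of noisy labels.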

Data Collection and Quality Challenges in Deep Learning: A Data-Centric AI Perspective

Steven Euijong Whang, Yuji Roh, Hwanjun Song, Jae-Gil Lee

Categories: Machine Learning

2021-12-13

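Software 2.0 is a fundamental shift in software engineering in which machine learning becomes the new software, powered by big data and computing infrastructure. As a result, software engineering needs to be rethought so that data becomes a first-class citizen on par with code. One striking observation is that 80-90% of the machine learning process is spent on data preparation. Without good data, even the best machine learning algorithms cannot perform well. Consequently, data-centric AI practices are now becoming mainstream. Unfortunately, many real-world datasets are small, dirty, biased, and even poisoned. In this survey, we study the research landscape for data collection and data quality, primarily for deep learning applications. Data collection is important because recent deep learning approaches need less feature engineering but larger amounts of data. For data quality, we study data validation and data cleaning techniques. Even if the data cannot be fully cleaned, we can still cope with imperfect data during model training by using robust model training techniques. In addition, while bias and fairness have been less studied in traditional data management research, these issues have become essential topics in modern machine learning applications. We therefore study fairness measures and unfairness mitigation techniques that can be applied before, during, or after model training. We believe the data management community is well poised to solve problems in these directions.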

ViDT: An Efficient and Effective Fully Transformer-based Object Detector

Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang

Categories: Computer Vision | Machine Learning

2021-10-08

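Transformers are transforming the landscape of computer vision, especially for recognition tasks. Detection transformers are the first fully end-to-end learning systems for object detection, while vision transformers are the first fully transformer-based architecture for image classification. In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient object detector. ViDT introduces a reconfigured attention module that extends the recent Swin Transformer into a standalone object detector, followed by a computationally efficient transformer decoder that exploits multi-scale features and auxiliary techniques to boost detection performance without much increase in computational load. Extensive evaluation results on the Microsoft COCO benchmark dataset show that ViDT obtains the best AP and latency trade-off among existing transformer-based object detectors, and achieves 49.2 AP owing to its high scalability for large models. We will release the code and trained models at https://github.com/naver-ai/vidt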

Learning from Noisy Labels with Deep Neural Networks: A Survey

Hwanjun Song, Minseok Kim, Dongmin Park, Yooju Shin, Jae-Gil Lee

Categories: Machine Learning | Computer Vision | Machine Learning (Statistics)

2020-07-16

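Deep learning has achieved remarkable success in numerous domains with the help of large amounts of data. However, the quality of data labels is a concern because high-quality labels are lacking in many real-world scenarios. Because noisy labels severely degrade the generalization performance of deep neural networks, learning from noisy labels (robust training) is becoming an important task in modern deep learning applications. In this survey, we first describe the problem of learning with label noise from a supervised learning perspective. Next, we provide a comprehensive review of 62 state-of-the-art robust training methods, all of which are categorized into five groups according to their methodological differences, followed by a systematic comparison of six properties used to evaluate their superiority. Subsequently, we perform an in-depth analysis of noise rate estimation and summarize the commonly used evaluation methodology, including public noisy datasets and evaluation metrics. Finally, we present several promising research directions that can serve as a guideline for future studies. All the contents will be available at https://github.com/songhwanjun/awesome-noisy-labels.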

MGTAB: A Multi-Relational Graph-Based Twitter Account Detection Benchmark

Shuhao Shi, Kai Qiao, Jian Chen, Shuai Yang, Jie Yang, Baojie Song, Linyuan Wang, Bin Yan

Categories: Computer Vision

2023-01-03

The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which holds back graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB is built on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extract the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we perform a thorough evaluation of MGTAB and other public datasets. Our experiments show that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and suggest potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
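
Selecting features by information gain, as mentioned for the user property features, can be sketched as follows. The sketch assumes discrete feature values and uses the standard entropy-based definition; the feature names (`verified`, `has_bio`) and the helper names are hypothetical and do not come from the MGTAB code base.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(features, labels, k):
    """Names of the k features with the highest information gain."""
    ranked = sorted(
        features,
        key=lambda name: information_gain(features[name], labels),
        reverse=True,
    )
    return ranked[:k]


labels = ["bot", "bot", "human", "human"]
features = {"verified": [0, 0, 1, 1], "has_bio": [0, 1, 0, 1]}
best = top_k_features(features, labels, k=1)
```

Here `verified` separates the two classes perfectly (gain of 1 bit) while `has_bio` carries no information, so only `verified` is selected. Continuous properties would need to be binned before this calculation applies.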

EZInterviewer: To Improve Job Interview Performance with Mock Interview Generator

Mingzhe Li, Xiuying Chen, Weiheng Liao, Yang Song, Tao Zhang, Dongyan Zhao, Rui Yan

Categories: Natural Language Processing

2023-01-03

The interview is regarded as one of the most crucial steps in recruitment. To prepare fully for interviews with recruiters, job seekers usually practice mock interviews with each other. However, such mock interviews with peers are generally far from the real interview experience: the mock interviewers are not guaranteed to be professional and are unlikely to behave like real interviewers. Due to the rapid growth of online recruitment in recent years, recruiters tend to hold interviews online, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from online interview data and provide mock interview services to job seekers. The task is challenging in two ways: (1) interview data are now available but still low-resource; (2) generating meaningful and relevant interview dialogs requires a thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and the dialog generator, so that most parameters can be trained with ungrounded dialogs as well as resume data, which are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results in generating mock interviews. With the help of EZInterviewer, we hope to make mock interview practice easier for job seekers.

Deep Spectral Q-learning with Application to Mobile Health

Yuhe Gao, Chengchun Shi, Rui Song

Categories: Machine Learning (Statistics) | Machine Learning

2023-01-03

Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with deep Q-learning to handle mixed-frequency data. In theory, we prove that the mean return under the estimated optimal policy converges to that under the optimal one, and we establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.
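
The abstract does not spell out how PCA is combined with Q-learning. One plausible reading, assumed here, is that densely sampled covariates are compressed into a few principal-component scores before being passed to the Q-network. The sketch below computes only the leading component, using power iteration as a stand-in for a full PCA routine; all function names are hypothetical.

```python
import math
import random


def first_principal_component(rows, iterations=200, seed=0):
    """Estimate the feature means and leading principal direction of `rows`.

    Uses power iteration on the sample covariance matrix and needs at least
    two rows. The sign of the returned direction is arbitrary.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(d)]  # random starting direction
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    return means, vec


def project(row, means, component):
    """Score of one observation along the principal direction."""
    return sum((x - m) * c for x, m, c in zip(row, means, component))


# Hypothetical high-frequency readings: two sensors that move together.
readings = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
means, component = first_principal_component(readings)
score = project([3.0, 3.0], means, component)
```

The resulting score is a one-dimensional summary of the two readings that could be concatenated with low-frequency covariates to form the state given to a Q-learning agent.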
